8 research outputs found

    Strain-aware assembly of genomes from mixed samples using flow variation graphs

    Get PDF
    The goal of strain-aware genome assembly is to reconstruct all individual haplotypes from a mixed sample at the strain level and to provide abundance estimates for the strains

    A high-quality human reference panel reveals the complexity and distribution of genomic structural variants

    Get PDF
    Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously under reported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals

    Computational pan-genomics: Status, promises and challenges

    Get PDF
    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different Computational methods and paradigms are needed.We will witness the rapid extension of Computational pan-genomics, a new sub-area of research in Computational biology. In this article, we generalize existing definitions and understand a pangenome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a Computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations

    VG-flow: haplotype reconstruction and abundance estimation from mixed samples using flow variation graphs

    No full text
    The goal of haplotype-aware genome assembly is to reconstruct all individual haplotypes from a mixed sample and to provide corresponding abundance estimates. VG-flow provides a reference-genome-independent solution based on the construction of a flow variation graph, capturing all quasispecies diversity present in the sample. VG-Flow computes a solution to the contig abundance estimation problem and employs a greedy algorithm to efficiently build full-length haplotypes. Finally, VG-Flow calculates accurate frequency estimates for the reconstructed haplotypes through linear programming techniques
    corecore